Predictive Complexity Priors
Specifying a Bayesian prior is notoriously difficult for complex models such as neural networks. Reasoning about parameters is made challenging by the high-dimensionality and over-parameterization of the space. Priors that seem benign and uninformative can have unintuitive and detrimental effects on a model's predictions. For this reason, we propose predictive complexity priors: a functional prior that is defined by comparing the model's predictions to those of a reference model. Although originally defined on the model outputs, we transfer the prior to the model parameters via a change of variables. The traditional Bayesian workflow can then proceed as usual. We apply our predictive complexity prior to high-dimensional regression, reasoning over neural network depth, and sharing of statistical strength for few-shot learning.
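The change-of-variables step described above can be illustrated with a toy one-dimensional model. This is a hypothetical sketch, not the paper's construction: the model is `tanh(theta)`, the reference model predicts a constant zero, the "complexity" is the absolute gap between the two, and an assumed Exponential prior on that gap is pulled back to the parameter.

```python
import numpy as np

# Hypothetical 1-D illustration: a prior on predictive complexity
# d(theta) = |f(theta) - f_ref(theta)| is transferred to theta via the
# change-of-variables formula log p(theta) = log p_d(d) + log|d'(theta)|.
# The model f, reference, and Exponential rate are all illustrative choices.

RATE = 2.0  # assumed rate of the Exponential prior on complexity

def model(theta):
    return np.tanh(theta)

def reference(theta):
    return 0.0 * theta  # reference model: constant zero prediction

def log_prior_on_complexity(d):
    # Exponential(RATE) log density on d >= 0.
    return np.log(RATE) - RATE * d

def log_prior_on_theta(theta, eps=1e-6):
    d = np.abs(model(theta) - reference(theta))
    # |dd/dtheta| via a finite difference (an analytic Jacobian works too)
    d_eps = np.abs(model(theta + eps) - reference(theta + eps))
    jac = np.abs((d_eps - d) / eps)
    return log_prior_on_complexity(d) + np.log(jac + 1e-12)

# The induced prior on theta concentrates where the model behaves like the
# simple reference (theta near 0 here).
print(log_prior_on_theta(0.1) > log_prior_on_theta(3.0))  # True
```

The point of the sketch is that the prior is specified entirely in prediction space, yet the Jacobian term makes it a proper density over the parameter.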
Bayesian batch active learning as sparse subset approximation
Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the most informative data points to be labeled. However, for many large-scale problems standard greedy procedures become computationally infeasible and suffer from negligible model change. In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues. Our approach is motivated by approximating the complete data posterior of the model parameters. While naive batch construction methods result in correlated queries, our algorithm produces diverse batches that enable efficient active learning at scale. We derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and generalize to arbitrary models using random projections. We demonstrate the benefits of our approach on several large-scale regression and classification tasks.
Comment: NeurIPS 201
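The batch-construction idea can be sketched in a simplified form. This is a hedged stand-in for the paper's procedure (which uses Frank-Wolfe optimization), with made-up per-point statistics: each pool point is summarized by a random projection, and a greedy loop picks points whose summed features best track the full-pool sum, which naturally discourages correlated queries.

```python
import numpy as np

# Hedged sketch of batch selection as sparse subset approximation: each
# unlabeled point is summarized by a random projection phi_i of a
# (stand-in) per-point statistic, and we greedily choose a batch whose
# summed features approximate the sum over the whole pool.  This is a
# simplification of the Frank-Wolfe procedure used in the paper.

rng = np.random.default_rng(0)
n_pool, dim, n_proj, batch_size = 200, 10, 32, 5

stats = rng.normal(size=(n_pool, dim))           # stand-in per-point statistics
proj = rng.normal(size=(dim, n_proj)) / np.sqrt(n_proj)
phi = stats @ proj                               # random projections
target = phi.sum(axis=0)                         # full-pool summary to match

batch, residual = [], target.copy()
for _ in range(batch_size):
    scores = phi @ residual                      # alignment with what is missing
    scores[batch] = -np.inf                      # never re-query a point
    i = int(np.argmax(scores))
    batch.append(i)
    residual = residual - phi[i]                 # shrink the unexplained part

print(sorted(batch))
```

Because each selection is scored against the residual rather than the full target, points similar to already-chosen ones score low, which is what yields diverse batches.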
Enhancing VAEs for Collaborative Filtering: Flexible Priors & Gating Mechanisms
Neural network based models for collaborative filtering have started to gain attention recently. One branch of research is based on using deep generative models to model user preferences, where variational autoencoders were shown to produce state-of-the-art results. However, there are some potentially problematic characteristics of the current variational autoencoder for CF. The first is the overly simplistic prior that VAEs incorporate for learning the latent representations of user preference. The other is the model's inability to learn deeper representations with more than one hidden layer for each network. Our goal is to incorporate appropriate techniques to mitigate the aforementioned problems of variational autoencoder CF and further improve the recommendation performance. Our work is the first to apply flexible priors to collaborative filtering and show that simple priors (in original VAEs) may be too restrictive to fully model user preferences and that setting a more flexible prior gives significant gains. We experiment with the VampPrior, originally proposed for image generation, to examine the effect of flexible priors in CF. We also show that VampPriors coupled with gating mechanisms outperform SOTA results, including the Variational Autoencoder for Collaborative Filtering, by meaningful margins on 2 popular benchmark datasets (MovieLens & Netflix).
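The VampPrior mentioned above replaces the standard normal prior with a mixture of the encoder's posteriors evaluated at a small set of learned pseudo-inputs. Below is a minimal numpy sketch of evaluating that log density; the linear "encoder" and its weights are placeholders, not the recommendation model's actual inference network.

```python
import numpy as np

# Minimal sketch of a VampPrior density: the prior over latent z is the
# average of the encoder's K posteriors at learned pseudo-inputs,
# p(z) = (1/K) sum_k q(z | u_k).  The encoder here is a made-up linear
# map with unit variance, standing in for the VAE's inference network.

rng = np.random.default_rng(1)
K, x_dim, z_dim = 4, 8, 2

pseudo_inputs = rng.normal(size=(K, x_dim))      # learned jointly in practice
W_mu = rng.normal(size=(x_dim, z_dim)) * 0.1     # stand-in encoder weights

def encoder(x):
    mu = x @ W_mu
    log_var = np.zeros_like(mu)                  # unit variance for simplicity
    return mu, log_var

def log_gaussian(z, mu, log_var):
    return -0.5 * np.sum(log_var + np.log(2 * np.pi)
                         + (z - mu) ** 2 / np.exp(log_var), axis=-1)

def vamp_log_prior(z):
    mu, log_var = encoder(pseudo_inputs)         # K mixture components
    comps = log_gaussian(z[None, :], mu, log_var)
    return np.logaddexp.reduce(comps) - np.log(K)  # log of the mixture mean

z = rng.normal(size=z_dim)
print(vamp_log_prior(z))
```

Because the components are coupled to the encoder, the prior can match the aggregate posterior far better than a single standard Gaussian, which is the flexibility the abstract credits for the gains.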
Calibrated Learning to Defer with One-vs-All Classifiers
The learning to defer (L2D) framework has the potential to make AI systems safer. For a given input, the system can defer the decision to a human if the human is more likely than the model to take the correct action. We study the calibration of L2D systems, investigating if the probabilities they output are sound. We find that Mozannar & Sontag's (2020) multiclass framework is not calibrated with respect to expert correctness. Moreover, it is not even guaranteed to produce valid probabilities due to its parameterization being degenerate for this purpose. We propose an L2D system based on one-vs-all classifiers that is able to produce calibrated probabilities of expert correctness. Furthermore, our loss function is also a consistent surrogate for multiclass L2D, like Mozannar & Sontag's (2020). Our experiments verify that not only is our system calibrated, but this benefit comes at no cost to accuracy. Our model's accuracy is always comparable (and often superior) to Mozannar & Sontag's (2020) model's in tasks ranging from hate speech detection to galaxy classification to diagnosis of skin lesions.
Comment: Accepted at the International Conference on Machine Learning (ICML), 202
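The one-vs-all idea at prediction time can be sketched as follows. This is a hedged illustration, not the paper's exact decision rule: each class gets its own sigmoid-calibrated score, an extra head scores expert correctness, and the system defers when the expert's probability beats the best class probability. The logit values are invented for the example.

```python
import numpy as np

# Hedged sketch of one-vs-all learning-to-defer at prediction time: K class
# logits plus one "the expert is correct" logit, each squashed by its own
# sigmoid rather than a joint softmax, so every output is a standalone
# probability.  Defer when the expert-correctness probability wins.

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def decide(class_logits, expert_logit):
    p_classes = sigmoid(np.asarray(class_logits))   # one-vs-all probabilities
    p_expert = sigmoid(expert_logit)                # P(expert correct | x)
    if p_expert > p_classes.max():
        return "defer"
    return int(np.argmax(p_classes))

print(decide([2.0, -1.0, 0.5], expert_logit=-0.5))  # model predicts class 0
print(decide([0.2, -1.0, 0.1], expert_logit=2.5))   # defers to the expert
```

The per-output sigmoids are what make the expert-correctness estimate a valid, separately calibratable probability, in contrast to a softmax parameterization that ties it to the class scores.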
Dropout as a structured shrinkage prior
Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given the equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for resnets, terming the prior automatic depth determination as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation in regression benchmarks.
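The reparametrization behind the equivalence is easy to check numerically in the simplest Gaussian case. This is a sanity-check sketch under assumed values, not the paper's derivation: multiplying a weight by Gaussian noise with mean 1 and variance alpha yields an effective weight distributed as a Gaussian whose scale is tied to the weight itself, i.e. a shrinkage-style prior.

```python
import numpy as np

# Monte Carlo check of the scale-mixture view: with z ~ N(1, alpha), the
# noisy weight w = z * theta is distributed N(theta, alpha * theta^2) --
# a prior whose scale grows with |theta|.  Bernoulli dropout admits an
# analogous (discrete) mixture reading.  theta, alpha are illustrative.

rng = np.random.default_rng(2)
theta, alpha, n = 1.5, 0.3, 200_000

z = rng.normal(loc=1.0, scale=np.sqrt(alpha), size=n)
w = z * theta                                    # noisy effective weights

print(abs(w.mean() - theta) < 0.05)              # mean matches theta
print(abs(w.var() - alpha * theta**2) < 0.05)    # variance is alpha * theta^2
```

Averaging the training loss over samples of `z` is then, up to the mixture marginalization, what the abstract describes as an approximation to marginal MAP estimation.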